Use CTRL/CMD + Shift + K to knit and preview your markdown. Hit
the Visual button, or use
CTRL/CMD + Shift + F4 to switch to visual mode, which will
let you edit the formatted version in real time.
At the top of the .Rmd file is the YAML header. YAML originally stood
for "Yet Another Markup Language" and now officially stands for "YAML
Ain't Markup Language". It is a human-readable data serialization
language. The header sets some options for your markdown and gives you a nicely
formatted preamble. It is currently arranged with some of my preferred
settings, but feel free to play around with this and make it your
own.
We have it set here to output as html, but you can just as easily produce PDF or Word documents. Note that some outputs may come out differently (or not at all) when we render to different formats. There are a bunch of built-in themes that you can explore here.
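As a rough illustration, a header that produces HTML output with numbered headers and a three-level table of contents might look like the sketch below. The title, author, and theme values are placeholders rather than the settings used in this course's file; `toc`, `toc_depth`, `number_sections`, and `theme` are standard `html_document` options.

```yaml
---
title: "My Analysis"        # placeholder title
author: "Your Name"         # placeholder author
output:
  html_document:
    toc: true               # include a table of contents
    toc_depth: 3            # track header levels 1 to 3 only
    number_sections: true   # number the headers automatically
    theme: flatly           # one of the built-in themes
---
```

Swapping `html_document` for `pdf_document` or `word_document` changes the output format. The `theme` option only applies to HTML output.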
Below the YAML header in the .Rmd document you will find
the first “code chunk”. You will notice that this one does not appear in
the rendered document - this is because it is the setup
chunk, and it has the setting include=FALSE. We use this
setup chunk to set options for chunk behavior, as well as loading
packages and data and such. Note that the markdown will run in a totally
separate environment, so you have to load all your data and packages
within the .Rmd file.
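A minimal sketch of what the body of a setup chunk could contain is shown below. The specific options and packages are assumptions chosen to match the tools used later in this document, not a copy of the actual setup chunk. The chunk header itself would read `{r setup, include=FALSE}`.

```r
# Default options applied to every later chunk
knitr::opts_chunk$set(
  echo = TRUE,      # show code by default
  warning = FALSE,  # hide warnings in the rendered document
  message = FALSE   # hide package start-up messages
)

# Packages used later in the document (assumed to be installed)
library(dplyr)
library(ggplot2)
library(gapminder)  # provides the gapminder data frame
```

Any individual chunk can still override these defaults in its own header.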
The # above makes a header. A single # is
the largest header, and extra #s are smaller headers.
This header is automatically numbered because of the YAML settings
and the double #.
This is three #s.
This is four #s. Note that it does not show up in the
table of contents because we only asked it to keep track of the first
three levels.
Use a single asterisk to make font italic.
Use double asterisks to make font bold.
Note that you need a blank line between paragraphs to split up text. Starting on a new line is not enough.
To make bullet points, use -
To make numbered lists, use 1.
To put code in-line, use back ticks (``)
For multiple lines of verbatim code, use triple back ticks.
x + 1 = y
To make block quotes, use > at the start of the line.
Here we will explore some proper code chunks. You can use
CTRL/CMD + ALT + I to create a new chunk. After the
r comes the chunk name. This is not required, but is
convenient if we hit an error because it will tell us the name of the
chunk where the error was. Otherwise, it will just say “error in chunk
14” or some such.
We will be using data from Schneider et al. 2024. The code and data are available on a GitHub repository. While we’re at it, all the scripts and data for this course are available in a repository as well. Let’s start by cleaning up our data a little bit in this first code chunk:
# Remove ampersands
fsci$FSCI_region <- gsub('&', 'and', fsci$FSCI_region)
# Reduce to one variable
df <- fsci[fsci$short_label == 'Prevalence of undernourishment', ]
If we want to run our code but not show the code block, we can set
the echo=FALSE option in the chunk header. Otherwise, our
code chunk will be visible. Let’s show off our example regression from
the FSCI paper.
##
## Call:
## lm(formula = normvalue ~ year + FSCI_region, data = df, weights = weight)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -8790.7 -724.6 -96.6 825.2 9701.4
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 982.84949 71.45042 13.756
## year -0.48328 0.03555 -13.593
## FSCI_regionEastern Asia -5.96480 2.32473 -2.566
## FSCI_regionLatin America and Caribbean -3.30360 2.34519 -1.409
## FSCI_regionNorthern Africa and Western Asia -1.39026 2.38571 -0.583
## FSCI_regionNorthern America and Europe -9.92333 2.89919 -3.423
## FSCI_regionOceania 20.06159 5.13223 3.909
## FSCI_regionSouth-eastern Asia 1.88444 2.33252 0.808
## FSCI_regionSouthern Asia 6.89466 2.28244 3.021
## FSCI_regionSub-Saharan Africa 15.69689 2.31040 6.794
## Pr(>|t|)
## (Intercept) < 0.0000000000000002 ***
## year < 0.0000000000000002 ***
## FSCI_regionEastern Asia 0.01035 *
## FSCI_regionLatin America and Caribbean 0.15906
## FSCI_regionNorthern Africa and Western Asia 0.56012
## FSCI_regionNorthern America and Europe 0.00063 ***
## FSCI_regionOceania 0.0000951814320 ***
## FSCI_regionSouth-eastern Asia 0.41923
## FSCI_regionSouthern Asia 0.00255 **
## FSCI_regionSub-Saharan Africa 0.0000000000136 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2094 on 2484 degrees of freedom
## Multiple R-squared: 0.3246, Adjusted R-squared: 0.3221
## F-statistic: 132.6 on 9 and 2484 DF, p-value: < 0.00000000000000022
This shows our output, but not the code chunk.
We can see our regression output much like we do when we run it in a script, but it is not terribly nice to look at here.
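For readers without the source file, the hidden chunk presumably fits the weighted model shown in the Call line of the output. The sketch below reproduces the same pattern on made-up data, so it runs on its own. The `toy` data frame, its values, and the name `fit` are hypothetical and are not the FSCI data.

```r
# Hypothetical toy data standing in for the FSCI data frame `df`
set.seed(42)
toy <- data.frame(
  year   = rep(2000:2009, times = 3),
  region = rep(c('A', 'B', 'C'), each = 10),
  weight = runif(30, min = 1, max = 5)
)
toy$normvalue <- 50 - 0.5 * (toy$year - 2000) + rnorm(30)

# Weighted linear model with the same structure as the Call line above
fit <- lm(normvalue ~ year + region, data = toy, weights = weight)

# Printing the summary gives the raw console-style output
summary(fit)
```

In the .Rmd file, code like this would sit in a chunk whose header sets echo=FALSE, for example `{r regression, echo=FALSE}` (the chunk name is illustrative).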
To get a cleaner output, we can convert our regression results to a
data frame, then use knitr::kable() to create a nice
looking table.
lm_df <- broom::tidy(lm)
knitr::kable(lm_df)
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 982.8494947 | 71.4504180 | 13.7556857 | 0.0000000 |
| year | -0.4832786 | 0.0355529 | -13.5932126 | 0.0000000 |
| FSCI_regionEastern Asia | -5.9647972 | 2.3247336 | -2.5657982 | 0.0103520 |
| FSCI_regionLatin America and Caribbean | -3.3036003 | 2.3451881 | -1.4086718 | 0.1590574 |
| FSCI_regionNorthern Africa and Western Asia | -1.3902648 | 2.3857103 | -0.5827467 | 0.5601167 |
| FSCI_regionNorthern America and Europe | -9.9233258 | 2.8991908 | -3.4227915 | 0.0006299 |
| FSCI_regionOceania | 20.0615880 | 5.1322298 | 3.9089419 | 0.0000952 |
| FSCI_regionSouth-eastern Asia | 1.8844396 | 2.3325248 | 0.8078969 | 0.4192273 |
| FSCI_regionSouthern Asia | 6.8946569 | 2.2824376 | 3.0207428 | 0.0025473 |
| FSCI_regionSub-Saharan Africa | 15.6968937 | 2.3103971 | 6.7940243 | 0.0000000 |
We can take some extra steps to get the column names capitalized and the numbers rounded:
lm_df_cleaner <- lm_df %>%
dplyr::mutate(across(where(is.numeric), ~ round(.x, 3))) %>%
setNames(c(snakecase::to_title_case(names(.))))
knitr::kable(lm_df_cleaner)
| Term | Estimate | Std Error | Statistic | P Value |
|---|---|---|---|---|
| (Intercept) | 982.849 | 71.450 | 13.756 | 0.000 |
| year | -0.483 | 0.036 | -13.593 | 0.000 |
| FSCI_regionEastern Asia | -5.965 | 2.325 | -2.566 | 0.010 |
| FSCI_regionLatin America and Caribbean | -3.304 | 2.345 | -1.409 | 0.159 |
| FSCI_regionNorthern Africa and Western Asia | -1.390 | 2.386 | -0.583 | 0.560 |
| FSCI_regionNorthern America and Europe | -9.923 | 2.899 | -3.423 | 0.001 |
| FSCI_regionOceania | 20.062 | 5.132 | 3.909 | 0.000 |
| FSCI_regionSouth-eastern Asia | 1.884 | 2.333 | 0.808 | 0.419 |
| FSCI_regionSouthern Asia | 6.895 | 2.282 | 3.021 | 0.003 |
| FSCI_regionSub-Saharan Africa | 15.697 | 2.310 | 6.794 | 0.000 |
There are many more options available in knitr::kable() and the
kableExtra package for building static tables. Together they are
probably the most powerful table tools I’ve found. See the docs
for examples. Documentation is where you really learn how to use a package:
it is written by the author, with abundant vignettes and examples.
For a very clean regression table with less work, try the
sjPlot package:
sjPlot::tab_model(
lm,
p.style = 'numeric',
digits = 3,
show.se = TRUE,
robust = TRUE,
show.reflvl = TRUE,
dv.labels = 'Undernourishment',
pred.labels = gsub("FSCI_region", "", names(coef(lm)))
)
| Undernourishment | ||||
|---|---|---|---|---|
| Predictors | Estimates | std. Error | CI | p |
| (Intercept) | 982.849 | 115.808 | 755.760 – 1209.939 | <0.001 |
| year | -0.483 | 0.058 | -0.596 – -0.370 | <0.001 |
| Eastern Asia | -5.965 | 1.695 | -9.289 – -2.641 | <0.001 |
| Latin America and Caribbean | -3.304 | 1.516 | -6.276 – -0.332 | 0.029 |
| Northern Africa and Western Asia | -1.390 | 1.616 | -4.559 – 1.779 | 0.390 |
| Northern America and Europe | -9.923 | 1.721 | -13.297 – -6.549 | <0.001 |
| Oceania | 20.062 | 1.953 | 16.232 – 23.891 | <0.001 |
| South-eastern Asia | 1.884 | 1.503 | -1.063 – 4.832 | 0.210 |
| Southern Asia | 6.895 | 1.502 | 3.949 – 9.840 | <0.001 |
| Sub-Saharan Africa | 15.697 | 1.712 | 12.340 – 19.054 | <0.001 |
| Observations | 2494 | |||
| R2 / R2 adjusted | 0.325 / 0.322 | |||
Note that this function takes the lm object as an input,
not a data frame. It is designed to work with regression models and
provides a ton of options for displaying them. Check out the
documentation here.
A curious hiccup with this package is that the
show.fstat argument does not work. If you want to see why,
check out the code behind the function. You can do this either by
placing the cursor on the function and hitting F2 or by
using CTRL/CMD + left click on the function.
The stargazer package is quite popular in econometrics.
You can find a nice tutorial here, or a
quick paper and demo arguing why you should use it from the author here.
It’s a pretty nice package for easily displaying regressions in LaTeX,
but I wouldn’t personally recommend it for non-LaTeX applications.
Here we will put two models together in the same table:
# Get another regression
df2 <- fsci[fsci$short_label == 'Access to safe water', ]
lm2 <- lm(normvalue ~ year + FSCI_region, data = df2, weights = weight)
stargazer::stargazer(
lm,
lm2,
type = 'html',
font.size = 'footnotesize',
column.labels = c('Undernourishment', 'Safe Water'),
dep.var.labels.include = FALSE,
# stargazer lists the intercept last, so move its label to the end
covariate.labels = c(gsub("FSCI_region", "", names(coef(lm))[-1]), "(Intercept)")
)
| Dependent variable: | Undernourishment (1) | Safe Water (2) |
|---|---|---|
| year | -0.483*** | 0.683*** |
| | (0.036) | (0.038) |
| Eastern Asia | -5.965** | 33.607*** |
| | (2.325) | (2.514) |
| Latin America and Caribbean | -3.304 | -8.328*** |
| | (2.345) | (2.623) |
| Northern Africa and Western Asia | -1.390 | -5.368* |
| | (2.386) | (2.910) |
| Northern America and Europe | -9.923*** | 33.053*** |
| | (2.899) | (2.536) |
| Oceania | 20.062*** | 35.413*** |
| | (5.132) | (4.446) |
| South-eastern Asia | 1.884 | -26.648*** |
| | (2.333) | (2.606) |
| Southern Asia | 6.895*** | -12.534*** |
| | (2.282) | (2.508) |
| Sub-Saharan Africa | 15.697*** | -40.530*** |
| | (2.310) | (2.582) |
| (Intercept) | 982.849*** | -1,315.970*** |
| | (71.450) | (75.950) |
| Observations | 2,494 | 3,106 |
| R2 | 0.325 | 0.796 |
| Adjusted R2 | 0.322 | 0.795 |
| Residual Std. Error | 2,094.342 (df = 2484) | 2,995.163 (df = 3096) |
| F Statistic | 132.634*** (df = 9; 2484) | 1,342.004*** (df = 9; 3096) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | |
Note that we put the results='asis' option into the
chunk header. This tells knitr to pass the raw HTML that stargazer
prints straight into the document, so it renders as a table instead of
showing up as verbatim text.
We’ve already seen how to make tables above. For static tables,
knitr::kable() is a good choice.
For interactive tables, there are a couple of different options.
The DT package is a classic choice for interactive
tables. Note that we are setting echo=FALSE here, so the
code chunk will not be visible.
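Because the chunk is hidden, the exact call is not shown. A plausible minimal version, assuming the gapminder data used elsewhere in this document, might look like the following. The `filter` and `pageLength` settings are illustrative guesses, not the author's options, and `tbl` is a hypothetical name.

```r
# A guess at what the hidden chunk might contain; the real options are not shown
tbl <- DT::datatable(
  gapminder::gapminder,
  filter = 'top',                  # per-column filter boxes above the table
  options = list(pageLength = 10)  # rows shown per page
)

# Printing the widget renders the interactive table in HTML output
tbl
```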
My personal favorite for interactive tables is
reactable. The documentation is
excellent, so check it out if you’re interested.
reactable::reactable(
data = gapminder,
filterable = TRUE,
searchable = TRUE,
outlined = TRUE,
bordered = TRUE,
compact = TRUE,
striped = TRUE,
showPageSizeOptions = TRUE
)
I find the options for customization here much more intuitive than
DT, and the documentation is much easier to use.
An excellent reference for graphs is the R Graph Gallery, which has lots of examples to explore and accompanying code for each figure.
We haven’t really covered plots yet, but there is not much to it: put your plotting code in a chunk and the figure will appear in the rendered document.
The plot() function in
base R can do just about anything. I personally find that it works
great for simple plots, but more elaborate and pretty plots take more
work.
# Filter gapminder to the year 2007 only
gapminder_2007 <- gapminder[gapminder$year == 2007, ]
# Plot gapminder data
plot(
x = gapminder_2007$gdpPercap,
y = gapminder_2007$lifeExp,
col = gapminder_2007$continent,
pch = 16,
cex = sqrt(gapminder_2007$pop) / 10000,
ylab = 'Life Expectancy',
xlab = 'GDP per Capita',
main = 'Life Expectancy against GDP per Capita (2007)'
)
# Add a legend to the plot above
legend(
"bottomright",
legend = levels(gapminder_2007$continent),
col = 1:5,
pch = 16,
title = "Continent"
)
The ggplot2 package is one of the biggest strengths of R
in my opinion. It is an excellent package for making pretty plots
easily, with tons of extensions and extra packages for applications in
mapping, chord diagrams, dendrograms, animations, etc. For a nice
gallery of R graphs including example code, check out the R Graph Gallery.
# Save this plot to an object so we can use it again later
gapminder_static <- gapminder %>%
dplyr::filter(year == 2007) %>%
ggplot2::ggplot(aes(x = gdpPercap, y = lifeExp, color = continent, size = pop)) +
geom_point() +
theme_classic() +
labs(
x = 'GDP per Capita',
y = 'Life Expectancy',
title = 'Life Expectancy against GDP per Capita (2007)'
)
# Show plot created above
gapminder_static
We can change the alignment, size, and resolution of our plot in the chunk options:
# Show same plot from last chunk but with different settings
gapminder_static
Caption Goes Here
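For a single chunk, these settings go in the chunk header, for example `{r, fig.align='center', fig.width=6, fig.height=4, dpi=300, fig.cap='Caption Goes Here'}`. The same options can also be set once as document-wide defaults, as in the sketch below. The numeric values are assumptions for illustration, not the settings used for the figure above.

```r
# Illustrative defaults; the values are assumptions, not this document's settings
knitr::opts_chunk$set(
  fig.align  = 'center',  # horizontal alignment of the figure
  fig.width  = 6,         # width in inches
  fig.height = 4,         # height in inches
  dpi        = 300        # resolution for raster output
)
```

Captions are usually set per chunk with fig.cap, since each figure needs its own text.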
What about an interactive plot? We can use the very popular
plotly package to do this. It is built on the JavaScript library
plotly.js and is also widely used from Python, but the
plotly R package gives us an easy way to access it. It has
its own syntax, but you can also use the ggplotly()
function to convert a ggplot object to a plotly object.
# This time we'll save the plot to an object that we can call later
gapminder_interactive <- gapminder %>%
dplyr::filter(year == 2007) %>%
ggplot2::ggplot(aes(
x = gdpPercap,
y = lifeExp,
color = continent,
size = pop,
text = paste0(
'Country: ', country, '\n',
'Continent: ', continent, '\n',
'GDP per capita: $',
stringr::str_squish(format(round(gdpPercap, 0), big.mark = ',')), '\n',
'Life Exp: ', round(lifeExp, 1), ' years\n',
'Population: ', format(pop, big.mark = ',')
)
)) +
geom_point() +
theme_classic() +
labs(
x = 'GDP per Capita',
y = 'Life Expectancy',
title = 'Life Expectancy against GDP per Capita (2007)'
)
# Use ggplotly function on the plot object we made above
plotly::ggplotly(gapminder_interactive, tooltip = 'text')
Note that you can hover over points to see more information from our
text field, and also move, zoom, select, and download a
static image of the plot.
Great quick references:
If you want to dive deeper: